Deriving Local Internal Logic for Black Box Models
Despite their widespread use, machine learning methods produce black box models: it is hard to understand how features influence a model's prediction. We propose a novel explanation method that explains the predictions of any classifier by analyzing the prediction change obtained by omitting relevant subsets of attribute values. The local internal logic is captured by learning a local model in the neighborhood of the prediction to explain. The explanations provided by our method are effective in detecting associations among attributes and the class label.
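The omission idea above can be sketched in a few lines: replace an attribute's value with a neutral baseline and record how the prediction changes. The classifier, the baseline choice, and all names below are illustrative assumptions, not the paper's actual algorithm.

```python
def toy_classifier(x):
    # Toy black box (an assumption for illustration): the positive-class
    # score rises with x[0] and x[2]; x[1] is irrelevant.
    score = 0.5 + 0.3 * x[0] + 0.2 * x[2]
    return min(max(score, 0.0), 1.0)

def explain_by_omission(predict, x, baseline):
    """Score each attribute by the prediction change when its value is
    replaced with a neutral baseline (a simple stand-in for omission)."""
    full = predict(x)
    contributions = {}
    for i in range(len(x)):
        x_omit = list(x)
        x_omit[i] = baseline[i]          # "omit" attribute i
        contributions[i] = full - predict(x_omit)
    return contributions

contrib = explain_by_omission(toy_classifier, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

In this sketch the irrelevant attribute gets a zero contribution, while the two attributes the toy model actually uses receive scores proportional to their weights.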
PoliTeam @ AMI: Improving Sentence Embedding Similarity with Misogyny Lexicons for Automatic Misogyny Identification in Italian Tweets
We present a multi-agent classification solution for identifying misogynous and aggressive content in Italian tweets. A first agent uses modern Sentence Embedding techniques to encode tweets and an SVM classifier to produce initial labels. A second agent, based on TF-IDF and Italian misogyny lexicons, is jointly adopted to improve the first agent on uncertain predictions. We evaluate our approach in the Automatic Misogyny Identification Shared Task of the EVALITA 2020 campaign. Results show that TF-IDF and lexicons effectively improve the supervised agent trained on sentence embeddings.
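The two-agent scheme reduces to a confidence-gated fallback: the primary agent answers when confident, and a lexicon-based agent takes over on uncertain cases. The stand-in agents, lexicon, and threshold below are invented for illustration and do not reproduce the paper's models.

```python
def agent_one(tweet):
    # Stand-in for the Sentence-Embedding + SVM agent: returns (label, confidence).
    # Keyword matching here merely mimics a classifier's behavior.
    hostile = {"odio", "stupida"}
    hits = sum(w in tweet.lower() for w in hostile)
    return ("misogynous" if hits else "neutral", 0.9 if hits != 1 else 0.4)

def agent_two(tweet, lexicon):
    # Stand-in for the TF-IDF + misogyny-lexicon agent used on uncertain cases.
    return "misogynous" if any(w in tweet.lower() for w in lexicon) else "neutral"

def classify(tweet, lexicon, threshold=0.5):
    label, conf = agent_one(tweet)
    if conf < threshold:                 # defer uncertain predictions to agent 2
        label = agent_two(tweet, lexicon)
    return label
```

The design choice worth noting is that agent 2 is consulted only below the confidence threshold, so the cheaper lexicon signal corrects the primary agent without overriding its confident decisions.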
Enhancing Interpretability of Black Box Models by means of Local Rules
We propose a novel rule-based method that explains the prediction of any classifier on a specific instance by analyzing the joint effect of feature subsets on the classifier prediction. The relevant subsets are identified by learning a local rule-based model in the neighborhood of the prediction to explain. While local rules give a qualitative insight into the local behavior, their relevance is quantified by using the concept of prediction difference.
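A minimal sketch of the local-rule idea: sample a neighborhood around the instance, label the samples with the black box, and keep the single-feature threshold rule that best agrees with it locally. The rule format, fixed threshold, and toy model are assumptions for illustration, not the paper's algorithm.

```python
import random

def black_box(x):
    # Toy black box: positive iff both features exceed 0.5.
    return int(x[0] > 0.5 and x[1] > 0.5)

def local_rule(predict, instance, n=500, radius=0.3, seed=0):
    """Sample a neighborhood around `instance`, label it with the black box,
    and return the single (feature, threshold) rule with the highest local
    agreement: (feature index, threshold, agreement)."""
    rng = random.Random(seed)
    samples = [[v + rng.uniform(-radius, radius) for v in instance]
               for _ in range(n)]
    labels = [predict(s) for s in samples]
    best = None
    for i in range(len(instance)):
        thr = 0.5  # fixed threshold, an illustrative simplification
        agree = sum((s[i] > thr) == bool(y) for s, y in zip(samples, labels)) / n
        if best is None or agree > best[2]:
            best = (i, thr, agree)
    return best
```

For an instance such as `[0.7, 0.9]`, the second feature stays above 0.5 everywhere in the neighborhood, so the rule on the first feature is the one that locally explains the black box.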
Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence
Machine learning models may perform differently on different data subgroups, which we represent as itemsets (i.e., conjunctions of simple predicates). The identification of these critical data subgroups plays an important role in many applications, for example model validation and testing, or evaluation of model fairness. Typically, domain expert help is required to identify relevant (or sensitive) subgroups.
We propose the notion of divergence over itemsets as a measure of different classification behavior on data subgroups, and the use of frequent pattern mining techniques for their identification.
A quantification of the contribution of different attribute values to divergence, based on the mathematical foundations provided by Shapley values, allows us to identify both critical and peculiar behaviors of attributes.
Extended experiments show the effectiveness of the approach in identifying critical subgroup behaviors.
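The core of itemset divergence can be illustrated as the gap between a subgroup's error rate and the global error rate, where the subgroup is the set of records matching a conjunction of attribute-value pairs. The dataset and field names below are invented for illustration.

```python
def divergence(records, itemset):
    """records: list of dicts with attribute values and an 'error' flag (0/1).
    itemset: dict of attribute -> value defining the subgroup (a conjunction
    of simple predicates). Returns subgroup error rate minus global rate."""
    global_err = sum(r["error"] for r in records) / len(records)
    members = [r for r in records
               if all(r.get(a) == v for a, v in itemset.items())]
    if not members:
        return 0.0
    subgroup_err = sum(r["error"] for r in members) / len(members)
    return subgroup_err - global_err

data = [
    {"sex": "F", "age": "young", "error": 1},
    {"sex": "F", "age": "old",   "error": 1},
    {"sex": "M", "age": "young", "error": 0},
    {"sex": "M", "age": "old",   "error": 0},
]
```

Here the itemset `{"sex": "F"}` has divergence +0.5 (all errors fall in that subgroup), flagging it as a critical subgroup; frequent pattern mining in the paper is what makes searching over all such itemsets tractable.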
Semantic Image Collection Summarization with Frequent Subgraph Mining
Applications such as providing a preview of personal albums (e.g., Google Photos) or suggesting thematic collections based on user interests (e.g., Pinterest) require a semantically enriched image representation, which should be more informative than simple low-level visual features and image tags. To this aim, we propose an image collection summarization technique based on frequent subgraph mining. We represent images with a novel type of scene graph that includes fine-grained relationship types between objects. These scene graphs are automatically derived by our method. The resulting summary consists of a set of frequent subgraphs describing the underlying patterns of the image dataset. Our results are interpretable and provide richer semantic information than previous techniques, in which the summary is a subset of the collection in terms of images or image patches. The experimental evaluation shows that the proposed technique yields non-redundant summaries with a high diversity of discovered patterns.
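A heavily simplified sketch of the mining step: counting frequent single-edge patterns (subject, relation, object) across scene graphs stands in for full frequent subgraph mining, which also grows multi-edge patterns. The scene graphs below are invented for illustration.

```python
from collections import Counter

def frequent_edges(scene_graphs, min_support):
    """Return (subject, relation, object) edge patterns that appear in at
    least `min_support` distinct scene graphs."""
    counts = Counter()
    for g in scene_graphs:
        for edge in set(g):          # count each pattern once per graph
            counts[edge] += 1
    return {e for e, c in counts.items() if c >= min_support}

graphs = [
    [("person", "riding", "bike"), ("bike", "on", "road")],
    [("person", "riding", "bike"), ("person", "wearing", "helmet")],
    [("dog", "on", "road")],
]
summary = frequent_edges(graphs, min_support=2)
```

The surviving pattern ("person riding bike") is the kind of recurring semantic structure a subgraph-based summary surfaces, in contrast to summaries that just pick representative images.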
A Hierarchical Approach to Anomalous Subgroup Discovery
Understanding peculiar and anomalous behavior of machine learning models for specific data subgroups is a fundamental building block of model performance and fairness evaluation. The analysis of these data subgroups can provide useful insights into the model's inner workings and highlight its potentially discriminatory behavior. Current approaches to subgroup exploration ignore the presence of hierarchies in the data and can only be applied to discretized attributes. The discretization process required for continuous attributes may significantly affect the identification of relevant subgroups.
We propose a hierarchical subgroup exploration technique to identify anomalous subgroup behavior at multiple granularity levels, along with a technique for the hierarchical discretization of data attributes. The hierarchical discretization produces, for each continuous attribute, a hierarchy of intervals. The subsequent hierarchical exploration can exploit data hierarchies, selecting for each attribute the optimal granularity to identify subgroups that are both anomalous and large enough to be statistically and practically significant. Compared to non-hierarchical approaches, we show that our hierarchical approach is more powerful in identifying anomalous subgroups and more stable with respect to discretization and exploration parameters.
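The notion of a hierarchy of intervals can be sketched with recursive median splits: each node is an interval, and its children partition it at the median of the values it contains. This is a stand-in under assumed names, not the paper's actual discretization algorithm.

```python
import statistics

def interval_tree(values, lo, hi, depth):
    """Build a nested hierarchy of intervals for one continuous attribute:
    each node holds an interval [lo, hi) and its two median-split children."""
    node = {"interval": (lo, hi), "children": []}
    inside = [v for v in values if lo <= v < hi]
    if depth == 0 or len(inside) < 2:
        return node                       # leaf: too deep or too few points
    mid = statistics.median(inside)
    if lo < mid < hi:
        node["children"] = [
            interval_tree(values, lo, mid, depth - 1),
            interval_tree(values, mid, hi, depth - 1),
        ]
    return node

tree = interval_tree([1, 2, 3, 4, 5, 6, 7, 8], 0, 10, depth=2)
```

An exploration over such a tree can stop at a coarse interval when finer ones become too small to be statistically significant, which is exactly the granularity-selection trade-off described above.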
ferret: a Framework for Benchmarking Explainers on Transformers
Many interpretability tools allow practitioners and researchers to explain Natural Language Processing systems. However, each tool requires different configurations and provides explanations in different forms, hindering the possibility of assessing and comparing them. A principled, unified evaluation benchmark will guide users through the central question: which explanation method is more reliable for my use case? We introduce ferret, an easy-to-use, extensible Python library to explain Transformer-based models integrated with the Hugging Face Hub. It offers a unified benchmarking suite to test and compare a wide range of state-of-the-art explainers on any text or interpretability corpora. In addition, ferret provides convenient programming abstractions to foster the introduction of new explanation methods, datasets, or evaluation metrics.